Recursive alignment block classification technique for word reordering in statistical machine translation
Authors
M. R. Costa-jussà (Barcelona Media Innovation Center, Av. Diagonal 177, 08018 Barcelona, Spain); J. A. R. Fonollosa and E. Monte (Universitat Politècnica de Catalunya, TALP Research Center, Jordi Girona 1-3, 08034 Barcelona, Spain)
Abstract
Statistical machine translation (SMT) is based on alignment models which learn the word correspondences between source and target language from bilingual corpora. These models are assumed to be capable of learning reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translation is swapped, i.e. those blocks which, if swapped, generate a correct monotonic translation. Afterwards, we classify these pairs into groups, recursively following a block co-occurrence criterion, in order to infer reorderings. Inside the same group, we allow new internal combinations in order to generalize the reordering to unseen pairs of blocks. Then, we identify the pairs of blocks in the source corpora (both training and test) which belong to the same group. We swap them and use the modified source training corpora to realign and to build the final translation system. We have evaluated our reordering approach in terms of both alignment and translation quality. In addition, we have used two state-of-the-art SMT systems: a phrase-based one and an N-gram-based one. Experiments are reported on the EuroParl task, showing improvements of almost 1 point in the standard MT evaluation metrics (mWER and BLEU).
Published in: Lang Resources & Evaluation (2011) 45:165–179, DOI 10.1007/s10579-010-9133-9
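The first step described in the abstract (finding consecutive source blocks whose translation is swapped, then swapping them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `max_block_len` limit, the consistency check, and the requirement that the two target spans be directly adjacent are all assumptions made for illustration, and the recursive classification of block pairs into groups is not covered. The three-word example sentence and its alignment are invented.

```python
from typing import List, Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links
Block = Tuple[int, int]           # inclusive (start, end) source positions


def target_span(alignment: Alignment, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Return (min, max) target positions linked to source words start..end."""
    targets = [t for s, t in alignment if start <= s <= end]
    return (min(targets), max(targets)) if targets else None


def is_consistent(alignment: Alignment, start: int, end: int, span: Tuple[int, int]) -> bool:
    """True if no target word inside span links to a source word outside start..end."""
    lo, hi = span
    return all(start <= s <= end for s, t in alignment if lo <= t <= hi)


def find_swapped_pairs(alignment: Alignment, src_len: int,
                       max_block_len: int = 3) -> List[Tuple[Block, Block]]:
    """List adjacent source blocks (A, B) whose target spans appear as B then A."""
    pairs = []
    for a_start in range(src_len):
        for a_end in range(a_start, min(a_start + max_block_len, src_len)):
            span_a = target_span(alignment, a_start, a_end)
            if span_a is None or not is_consistent(alignment, a_start, a_end, span_a):
                continue
            b_start = a_end + 1
            for b_end in range(b_start, min(b_start + max_block_len, src_len)):
                span_b = target_span(alignment, b_start, b_end)
                if span_b is None or not is_consistent(alignment, b_start, b_end, span_b):
                    continue
                # Assumed criterion: B's translation ends right before A's begins.
                if span_b[1] + 1 == span_a[0]:
                    pairs.append(((a_start, a_end), (b_start, b_end)))
    return pairs


def swap_blocks(words: List[str], pair: Tuple[Block, Block]) -> List[str]:
    """Return a copy of words with the two adjacent blocks exchanged."""
    (a_start, a_end), (b_start, b_end) = pair
    return (words[:a_start] + words[b_start:b_end + 1]
            + words[a_start:a_end + 1] + words[b_end + 1:])


# Invented example: "un programa ambicioso" aligned to "an ambitious programme".
source = ["un", "programa", "ambicioso"]
alignment = {(0, 0), (1, 2), (2, 1)}
pairs = find_swapped_pairs(alignment, len(source))
reordered = swap_blocks(source, pairs[0])
```

In this toy case the only swapped pair is the noun and the adjective, and exchanging them gives a source order that matches the target monotonically. A real system would also need a policy for overlapping candidate pairs and for unaligned words, which this sketch leaves out.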
Similar resources
Using Reordering in Statistical Machine Translation based on Recursive Alignment Block Classification
Statistical Machine Translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between the source and target language. These models are assumed to learn word reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. This paper proposes a Recursive Alignment Block Classification ...
Using Reordering in Statistical Machine Translation based on Alignment Block Classification
Statistical Machine Translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between the source and target language. These models are assumed to learn word reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. This paper proposes a Recursive Alignment Block Classification ...
Word Alignment-Based Reordering of Source Chunks in PB-SMT
Reordering poses a big challenge in statistical machine translation between distant language pairs. The paper presents how reordering between distant language pairs can be handled efficiently in phrase-based statistical machine translation. The problem of reordering between distant languages has been approached with prior reordering of the source text at chunk level to simulate the target langu...
Neural Reordering Model Considering Phrase Translation and Word Alignment for Phrase-based Translation
This paper presents an improved lexicalized reordering model for phrase-based statistical machine translation using a deep neural network. Lexicalized reordering suffers from reordering ambiguity, data sparseness and noise in the phrase table. A previous neural reordering model successfully solves the first and second problems but fails to address the third one. Therefore, we propose new featur...
Iterative reordering and word alignment for statistical MT
Word alignment is necessary for statistical machine translation (SMT), and reordering as a preprocessing step has been shown to improve SMT for many language pairs. In this initial study we investigate if both word alignment and reordering can be improved by iterating these two steps, since they both depend on each other. Overall no consistent improvements were seen on the translation task, but...
Journal title:
- Language Resources and Evaluation
Volume 45
Pages 165–179
Publication date 2011